<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>LMArena - Large Language Model Arena</title>
    <link href="https://cdn.staticfile.org/font-awesome/6.4.0/css/all.min.css" rel="stylesheet">
    <link href="https://cdn.staticfile.org/tailwindcss/2.2.19/tailwind.min.css" rel="stylesheet">
    <link href="https://fonts.googleapis.com/css2?family=Noto+Serif+SC:wght@400;500;600;700&family=Noto+Sans+SC:wght@300;400;500;700&display=swap" rel="stylesheet">
    <script src="https://cdn.jsdelivr.net/npm/mermaid@latest/dist/mermaid.min.js"></script>
    <style>
        body {
            font-family: 'Noto Sans SC', Tahoma, Arial, Roboto, "Droid Sans", "Helvetica Neue", "Droid Sans Fallback", "Heiti SC", "Hiragino Sans GB", Simsun, sans-serif;
            background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%);
            min-height: 100vh;
        }
        .hero-gradient {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
        }
        .card-hover {
            transition: all 0.3s ease;
        }
        .card-hover:hover {
            transform: translateY(-5px);
            box-shadow: 0 20px 40px rgba(0,0,0,0.1);
        }
        .text-gradient {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            -webkit-background-clip: text;
            -webkit-text-fill-color: transparent;
        }
        .feature-icon {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            -webkit-background-clip: text;
            -webkit-text-fill-color: transparent;
        }
        .step-number {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            color: white;
            width: 32px;
            height: 32px;
            display: flex;
            align-items: center;
            justify-content: center;
            border-radius: 50%;
            font-weight: bold;
        }
        .mermaid {
            background: white;
            padding: 2rem;
            border-radius: 1rem;
            box-shadow: 0 10px 30px rgba(0,0,0,0.1);
        }
        .drop-cap {
            float: left;
            font-size: 4rem;
            line-height: 1;
            font-weight: 700;
            margin-right: 0.5rem;
            margin-top: -0.2rem;
            color: #667eea;
            font-family: 'Noto Serif SC', serif;
        }
        .section-divider {
            width: 60px;
            height: 4px;
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            margin: 2rem auto;
            border-radius: 2px;
        }
    </style>
</head>
<body>
    <!-- Hero Section -->
    <section class="hero-gradient text-white py-20 px-6">
        <div class="max-w-6xl mx-auto text-center">
            <h1 class="text-5xl md:text-7xl font-bold mb-6 tracking-tight">
                LMArena
            </h1>
            <p class="text-2xl md:text-3xl mb-4 font-light">
                Large Language Model Arena
            </p>
            <p class="text-lg md:text-xl max-w-3xl mx-auto opacity-90 leading-relaxed">
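                Anonymous, blind side-by-side comparisons let AI researchers and developers around the world evaluate how large language models actually perform, producing a leaderboard grounded in real user preferences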
            </p>
            <div class="mt-10 flex justify-center space-x-6">
                <a href="https://lmarena.ai/" target="_blank" class="bg-white text-purple-700 px-8 py-4 rounded-full font-semibold hover:bg-gray-100 transition duration-300 flex items-center">
                    <i class="fas fa-external-link-alt mr-2"></i>
                    Visit the Platform
                </a>
                <button class="border-2 border-white px-8 py-4 rounded-full font-semibold hover:bg-white hover:text-purple-700 transition duration-300">
                    <i class="fas fa-play-circle mr-2"></i>
                    Watch the Demo
                </button>
            </div>
        </div>
    </section>

    <!-- Problem Section -->
    <section class="py-16 px-6 bg-white">
        <div class="max-w-6xl mx-auto">
            <h2 class="text-4xl font-bold text-center mb-4">
                <span class="text-gradient">What Problems It Solves</span>
            </h2>
            <div class="section-divider"></div>
            
            <div class="prose prose-lg max-w-none">
                <p class="text-gray-700 leading-relaxed mb-6">
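                    <span class="drop-cap">W</span>hen evaluating large language models, users run into subjective bias and the limits of benchmarks. Traditional benchmarks such as GLUE or MMLU are standardized but often detached from real-world use, so a model can score well in the lab yet fall short in production. Comparing responses from several models by hand is slow and tedious, and it is easily swayed by brand preference. And without live, community-driven data, it is hard to track how much a model actually improves from one release to the next, such as the jump from GPT-3.5 to GPT-4.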
                </p>
                
                <div class="grid md:grid-cols-3 gap-6 mt-10">
                    <div class="bg-gradient-to-br from-purple-50 to-pink-50 p-6 rounded-xl">
                        <i class="fas fa-eye-slash text-3xl feature-icon mb-4"></i>
                        <h3 class="text-xl font-semibold mb-2">Reducing Bias</h3>
                        <p class="text-gray-600">Blind testing hides model identities, so votes reflect response quality alone</p>
                    </div>
                    <div class="bg-gradient-to-br from-blue-50 to-purple-50 p-6 rounded-xl">
                        <i class="fas fa-users text-3xl feature-icon mb-4"></i>
                        <h3 class="text-xl font-semibold mb-2">Wisdom of the Crowd</h3>
                        <p class="text-gray-600">User votes are aggregated into a public leaderboard, giving a more objective performance signal</p>
                    </div>
                    <div class="bg-gradient-to-br from-pink-50 to-orange-50 p-6 rounded-xl">
                        <i class="fas fa-history text-3xl feature-icon mb-4"></i>
                        <h3 class="text-xl font-semibold mb-2">History Tracking</h3>
                        <p class="text-gray-600">Saved history and shareable prompts cut down on repeated work</p>
                    </div>
                </div>
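                <div class="mt-10">
                    <p class="text-gray-700 leading-relaxed mb-4">Here is a minimal Python sketch of the blind-test mechanism. It is illustrative only and assumes a hypothetical pool of model names (<code>model-alpha</code> and so on) and uniform random pairing. The platform's real model list and sampling policy are not described here. The sketch draws two distinct models and shows them only as "Model A" and "Model B". Their real names come back only once a vote is recorded.</p>
                    <pre class="bg-gray-900 text-gray-100 p-6 rounded-xl overflow-x-auto text-sm"><code>
```python
import random

# Hypothetical model pool; the real platform's models and sampling policy may differ.
MODELS = ["model-alpha", "model-beta", "model-gamma", "model-delta"]


def start_battle(rng: random.Random) -> dict:
    """Draw two distinct models and hide them behind neutral labels."""
    first, second = rng.sample(MODELS, 2)  # two distinct models, in random order
    return {"Model A": first, "Model B": second}


def record_vote(battle: dict, choice: str) -> tuple:
    """Map the user's anonymous choice back to (winner, loser) model names."""
    if choice not in battle:
        raise ValueError("choice must be 'Model A' or 'Model B'")
    other = "Model B" if choice == "Model A" else "Model A"
    return battle[choice], battle[other]


rng = random.Random(42)
battle = start_battle(rng)
print(sorted(battle))  # the user only ever sees: ['Model A', 'Model B']
winner, loser = record_vote(battle, "Model A")
print(winner != loser)  # True
```
</code></pre>
                    <p class="text-gray-600 leading-relaxed mt-4">For brevity, this sketch handles only a decisive vote for one side. Ties and any other vote options are left out.</p>
                </div>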
            </div>
        </div>
    </section>

    <!-- Core Features -->
    <section class="py-16 px-6 bg-gray-50">
        <div class="max-w-6xl mx-auto">
            <h2 class="text-4xl font-bold text-center mb-4">
                <span class="text-gradient">Core Features</span>
            </h2>
            <div class="section-divider"></div>
            
            <div class="grid md:grid-cols-2 gap-8 mt-10">
                <div class="bg-white p-8 rounded-2xl shadow-lg card-hover">
                    <div class="flex items-start mb-4">
                        <i class="fas fa-random text-3xl text-purple-600 mr-4"></i>
                        <div>
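                            <h3 class="text-2xl font-semibold mb-2">Blind Comparison Engine</h3>
                            <p class="text-gray-600 leading-relaxed">Users enter a custom prompt. The platform has two randomly selected LLMs respond and hides their names, so voting stays anonymous. This removes cognitive bias and helps developers quickly spot subtle differences, for example whether Claude has an edge over Gemini in creative writing.</p>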
                        </div>
                    </div>
                </div>
                
                <div class="bg-white p-8 rounded-2xl shadow-lg card-hover">
                    <div class="flex items-start mb-4">
                        <i class="fas fa-chart-line text-3xl text-purple-600 mr-4"></i>
                        <div>
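                            <h3 class="text-2xl font-semibold mb-2">Dynamic Leaderboard</h3>
                            <p class="text-gray-600 leading-relaxed">Votes from users worldwide continually update Elo-style model scores across categories such as writing, coding, and multilingual tasks. This makes it possible to follow how models evolve, for example how the release of Llama 3 affected Meta's ranking.</p>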
                        </div>
                    </div>
                </div>
                
                <div class="bg-white p-8 rounded-2xl shadow-lg card-hover">
                    <div class="flex items-start mb-4">
                        <i class="fas fa-bookmark text-3xl text-purple-600 mr-4"></i>
                        <div>
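                            <h3 class="text-2xl font-semibold mb-2">Prompt Library and Saved History</h3>
                            <p class="text-gray-600 leading-relaxed">Community-shared prompt templates are built in, and users with accounts can save chat logs and voting history. Reusing earlier prompts instead of starting from scratch speeds up A/B testing.</p>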
                        </div>
                    </div>
                </div>
                
                <div class="bg-white p-8 rounded-2xl shadow-lg card-hover">
                    <div class="flex items-start mb-4">
                        <i class="fas fa-plug text-3xl text-purple-600 mr-4"></i>
                        <div>
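                            <h3 class="text-2xl font-semibold mb-2">Multi-Model Integration</h3>
                            <p class="text-gray-600 leading-relaxed">Models from major LLM providers such as OpenAI, Anthropic, Meta, and Google are available in one place, so there is no need to juggle separate API keys.</p>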
                        </div>
                    </div>
                </div>
            </div>
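            <div class="bg-white p-8 rounded-2xl shadow-lg mt-8">
                <p class="text-gray-600 leading-relaxed mb-4">To make the leaderboard idea concrete, here is a minimal sketch of the classic online Elo update applied to a stream of pairwise votes. The K-factor of 32, the starting rating of 1000, and the model names are all assumptions for illustration. LMArena's own rating methodology may differ. For example, it might fit a Bradley-Terry model over all votes rather than update one vote at a time.</p>
                <pre class="bg-gray-900 text-gray-100 p-6 rounded-xl overflow-x-auto text-sm"><code>
```python
from collections import defaultdict

K = 32.0       # assumed K-factor (update step size)
BASE = 1000.0  # assumed starting rating for every model


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(ratings: dict, winner: str, loser: str) -> None:
    """Apply one classic online Elo update for a decisive vote."""
    e_win = expected_score(ratings[winner], ratings[loser])
    delta = K * (1.0 - e_win)  # winner gains exactly what the loser gives up
    ratings[winner] += delta
    ratings[loser] -= delta


# Hypothetical votes as (winner, loser) pairs.
ratings = defaultdict(lambda: BASE)
votes = [
    ("model-alpha", "model-beta"),
    ("model-alpha", "model-gamma"),
    ("model-beta", "model-gamma"),
]
for winner, loser in votes:
    update(ratings, winner, loser)

leaderboard = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
for name, score in leaderboard:
    print(f"{name}: {score:.1f}")
```
</code></pre>
                <p class="text-gray-600 leading-relaxed mt-4">Each update is zero-sum. An upset win over a higher-rated model moves both ratings further than an expected win does, because the winner's expected score was lower.</p>
            </div>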
        </div>
    </section>

    <!-- Use Cases -->
    <section class="py-16 px-6 bg-white">
        <div class="max-w-6xl mx-auto">
            <h2 class="text-4xl font-bold text-center mb-4">
                <span class="text-gradient">Use Cases</span>
            </h2>
            <div class="section-divider"></div>
            
            <div class="space-y-8 mt-10">
                <div class="bg-gradient-to-r from-purple-50 to-pink-50 p-8 rounded-2xl">
                    <h3 class="text-2xl font-semibold mb-4 flex items-center">
                        <i class="fas fa-briefcase text-purple-600 mr-3"></i>
                        Model Selection in Product Development
                    </h3>
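                    <p class="text-gray-700 leading-relaxed">An AI product manager needs to pick the best LLM for a chatbot. The traditional approach is to call each provider's API and test hundreds of prompts one by one. With LMArena, the manager enters typical user queries, votes blind, and then checks the leaderboard. If, for example, Claude 3 showed a win rate 15% higher than GPT-4o, that result could feed directly into the integration decision. This reduces the risk of a subjective choice and can shorten the cycle from evaluation to deployment, by roughly 30% in this scenario.</p>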
                </div>
                
                <div class="bg-gradient-to-r from-blue-50 to-purple-50 p-8 rounded-2xl">
                    <h3 class="text-2xl font-semibold mb-4 flex items-center">
                        <i class="fas fa-microscope text-blue-600 mr-3"></i>
                        Benchmark Validation for Research Papers
                    </h3>
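                    <p class="text-gray-700 leading-relaxed">An academic researcher compares open models such as Mistral with closed models such as PaLM on multilingual tasks. Suppose the platform's historical vote data put Mistral's Elo score for Spanish translation at 1250, ahead of PaLM's 1180. The researcher can then download the data as supporting evidence for the paper. This is more efficient than building a test set from scratch and mitigates small-sample bias.</p>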
                </div>
                
                <div class="bg-gradient-to-r from-pink-50 to-orange-50 p-8 rounded-2xl">
                    <h3 class="text-2xl font-semibold mb-4 flex items-center">
                        <i class="fas fa-code text-orange-600 mr-3"></i>
                        Developer Community Feedback Loop
                    </h3>
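                    <p class="text-gray-700 leading-relaxed">An independent developer wants to see how a custom fine-tuned model stacks up against established LLMs. Sharing prompts with the community and collecting hundreds of votes can reveal the model's weaknesses in code-debugging scenarios. In open-source projects this drives iterative improvement and exposes blind spots that isolated testing misses.</p>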
                </div>
            </div>
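            <div class="bg-gray-50 p-8 rounded-2xl mt-8">
                <p class="text-gray-700 leading-relaxed mb-4">An Elo gap like the one in the research example translates directly into an expected head-to-head preference rate. The sketch below applies the standard Elo logistic formula with the conventional 400-point scale, which is an assumption about how the scores are defined. It treats the 1250 and 1180 figures above as illustrative inputs, not published leaderboard data. The inverse function recovers the rating gap implied by an observed win rate.</p>
                <pre class="bg-gray-900 text-gray-100 p-6 rounded-xl overflow-x-auto text-sm"><code>
```python
import math


def win_probability(r_a: float, r_b: float) -> float:
    """Expected share of votes in which a model rated r_a beats one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rating_gap(p: float) -> float:
    """Rating gap implied by a head-to-head win rate p (p strictly between 0 and 1)."""
    return 400.0 * math.log10(p / (1.0 - p))


# Illustrative ratings from the research example above.
p = win_probability(1250, 1180)
print(f"{p:.3f}")                # 0.599
print(f"{rating_gap(0.6):.1f}")  # 70.4
```
</code></pre>
                <p class="text-gray-700 leading-relaxed mt-4">A 70-point gap therefore corresponds to roughly a 60/40 preference split. Keep this in mind when reading small differences on a leaderboard.</p>
            </div>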
        </div>
    </section>

    <!-- Visualization -->
    <section class="py-16 px-6 bg-gray-50">
        <div class="max-w-6xl mx-auto">
            <h2 class="text-4xl font-bold text-center mb-4">
                <span class="text-gradient">Platform Architecture and Workflow</span>
            </h2>
            <div class="section-divider"></div>
            
            <div class="mermaid">
                graph TB
                    A[User enters a prompt] -->